This is the second part of a two-part discussion (Part 1 appeared in August) in which the author considers fault-coverage analysis and simulation for full-scan testing of ASIC designs. These elements are equally applicable to the design of other IC types, such as FPGAs.
Scan testing is used extensively by ASIC vendors to detect manufacturing defects, which are generally modeled as stuck-at faults. The purpose of fault-coverage analysis is to accurately calculate the percentage of the total faults that are detectable and to find ways to cover any uncovered faults so that the final coverage reaches 95 percent or higher, the threshold ASIC vendors commonly use as evidence that the circuit has been manufactured correctly.
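As a point of reference, the basic arithmetic is simple: fault coverage (percent) = detected faults / total faults x 100. Many tools also report a test-efficiency figure in which faults proven untestable are removed from the denominator; exact definitions vary somewhat from vendor to vendor.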
In the first part Chang discussed practical methods to systematically analyze stuck-at faults not covered by full scan, providing real-world design examples. The discussion centered on important DFT rules that, if followed, greatly increase fault coverage.
In this second part, Chang details how fault simulation is used to simulate an ASIC's non-scan test vectors for fault-coverage rather than logic-verification purposes. Please refer to Part 1 for appropriate figures.
DFT Rule 1 (ISD Magazine, August 2001, page 46) and Figure 13 say that, for full-scan testing, you should either boundary-scan the embedded RAM modules or add test points around them, which is a design change. But what if that cannot be done, for reasons of schedule, performance, real estate and so on? And how would you cover the non-scan logic blocks in Figure 4? Most of the untestable faults may very well be in those blocks. In that case, it is common to try fault simulation, which uses a special simulator (called a fault simulator) that supports stuck-at fault models and simulates non-ATPG test vectors for fault-coverage purposes. The non-ATPG vectors can be either those originally developed for functional verification or those manually developed to target known faults. This is shown in Figure 14.
Fault simulation should never start from scratch. It should read in the fault-coverage result provided by the ATPG tool and add to it any additional coverage provided by non-ATPG test vectors. Therefore, it is of paramount importance that your fault simulator understand the output files of your ATPG tool. One way to ensure that is to use both tools from the same vendor; if they are from different vendors, make sure one can talk to the other before buying them. Another important concern is that some fault simulators cannot directly use your testbench (say, in Verilog) and require the input-stimulus and output-response files to be in specific formats, such as a raw format of ones and zeros.
You need a tool to do this translation. It might be available from your ASIC vendor, but most likely you will have to develop it yourself in a programming language such as C, C++ or Perl5. Make sure enough resources are allocated for this nontrivial task.
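If the two tools cannot share a database directly, the handoff can often be reduced to a fault list keyed by fault site and stuck-at value. The Python sketch below reads such a list and extracts the faults left for fault simulation to target; the record layout and status strings are hypothetical stand-ins for whatever your ATPG tool actually writes.

```python
# Sketch: seed fault simulation with the ATPG result. The record
# layout ('pin stuck-at status') and the status strings are assumed,
# not any particular tool's format.

SAMPLE_ATPG_OUTPUT = """\
u_core/u1/a 1 DETECTED
u_core/u1/a 0 UNDETECTED
u_misc/u7/z 1 DANGLING_UNTESTABLE
"""

def read_fault_list(text):
    """Return {(pin, stuck_at): status}, one record per line."""
    faults = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            pin, stuck_at, status = fields
            faults[(pin, stuck_at)] = status
    return faults

atpg = read_fault_list(SAMPLE_ATPG_OUTPUT)
# Only the faults ATPG did not detect remain targets for fault simulation.
targets = {key for key, status in atpg.items() if status != "DETECTED"}
print(f"{len(targets)} of {len(atpg)} faults left for fault simulation")
```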
Functional test vectors in Figure 14 consist of several different kinds of test vectors. The first step in fault simulation is to decide which kinds will provide the most coverage and to simulate only those. They are:
- Logic test vectors, developed by the ASIC designer or DVT engineer or both to functionally verify the design.
- BSR test vectors, which are created by the ASIC vendor using the BSDL files. If every primary I/O pin is in the boundary-scan chain, these vectors include parametric test vectors.
- Parametric test vectors, which are generated by the ASIC designer or DVT engineer or both for designs that do not have every primary I/O pin in the boundary-scan chain. They toggle each primary I/O pin to 1, 0 and the Z state (a generation sketch follows this list).
- TAP controller test vectors, which are developed by the ASIC designer or DVT engineer or both to walk TAP logic through every required state and scan instruction as well as to test the boundary-scan logic. Some faults in the TAP logic should be covered by scan testing itself, such as those on clock nets, but others must be tested by these vectors.
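To make the parametric vectors concrete, here is a minimal Python sketch that walks every primary I/O pin through 1, 0 and Z while the other pins hold their values. The pin names, the hold symbol and the one-line-per-cycle output are illustrative; real vectors must follow the ASIC vendor's format.

```python
# Sketch: generate parametric test vectors that toggle each primary
# I/O pin to 1, 0 and Z in turn. Pin names are hypothetical.

PRIMARY_IO = ["pi_a", "pi_b", "bidir_c", "po_d"]

def parametric_vectors(pins):
    """Yield one vector per (pin, value) pair; '-' means hold the previous value."""
    for pin in pins:
        for value in ("1", "0", "Z"):
            vector = {p: "-" for p in pins}
            vector[pin] = value
            yield vector

for v in parametric_vectors(PRIMARY_IO):
    print(" ".join(v[p] for p in PRIMARY_IO))   # one cycle per line
```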
Experience shows that logic test vectors are not very effective for fault coverage, since they do not specifically target faults. But by chance they can propagate some faults, whether covered by ATPG or not, to the primary outputs. If they are run on the chip tester anyway, use fault simulation to find out whether they happen to detect some uncovered faults. If they do, you are spared writing test vectors for those faults by hand, a task that requires a thorough understanding of the design and considerable effort.
BSR vectors are very hard to use for fault simulation, because the ASIC vendor's boundary-scan test tool reads only the BSDL files and generates BSR vectors based solely on its knowledge of the primary I/O pins: it knows nothing about the ASIC's internal logic. Figure 15 shows a boundary-scan chain consisting of two BSRs for a tristatable primary-output pin PO.
To run fault simulation on non-ATPG test vectors, make sure the simulator reproduces the expected output responses that come with those vectors; there must be no mismatch whatsoever. In the BSR case, we could determine from the TMS values in the BSR vectors when boundary-scan-chain shifting ends and compare the output responses at that time rather than during shifting. Fault simulators allow this, too, but monitoring the values on the TMS pin is not automatic. In addition, BSR vectors are generated with no knowledge of the ASIC's internal logic, which is precisely what we want to fault-grade, so it is unlikely that they will dramatically increase the fault coverage. For those reasons, BSR vectors are usually not used.
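Extracting the end of shifting from TMS is mechanical, because TMS steps the TAP controller through the fixed IEEE 1149.1 state machine. The Python sketch below flags the cycle at which the TAP leaves Shift-DR, which is when chain shifting ends; the sample TMS stream is illustrative, and in practice it would be pulled from the BSR vector file, one value per TCK cycle.

```python
# Sketch: track the IEEE 1149.1 TAP state machine from per-cycle TMS
# values and report where Shift-DR ends.

NEXT = {  # state: (next state if TMS=0, next state if TMS=1)
    "TEST-LOGIC-RESET": ("RUN-TEST/IDLE", "TEST-LOGIC-RESET"),
    "RUN-TEST/IDLE":    ("RUN-TEST/IDLE", "SELECT-DR-SCAN"),
    "SELECT-DR-SCAN":   ("CAPTURE-DR", "SELECT-IR-SCAN"),
    "CAPTURE-DR":       ("SHIFT-DR", "EXIT1-DR"),
    "SHIFT-DR":         ("SHIFT-DR", "EXIT1-DR"),
    "EXIT1-DR":         ("PAUSE-DR", "UPDATE-DR"),
    "PAUSE-DR":         ("PAUSE-DR", "EXIT2-DR"),
    "EXIT2-DR":         ("SHIFT-DR", "UPDATE-DR"),
    "UPDATE-DR":        ("RUN-TEST/IDLE", "SELECT-DR-SCAN"),
    "SELECT-IR-SCAN":   ("CAPTURE-IR", "TEST-LOGIC-RESET"),
    "CAPTURE-IR":       ("SHIFT-IR", "EXIT1-IR"),
    "SHIFT-IR":         ("SHIFT-IR", "EXIT1-IR"),
    "EXIT1-IR":         ("PAUSE-IR", "UPDATE-IR"),
    "PAUSE-IR":         ("PAUSE-IR", "EXIT2-IR"),
    "EXIT2-IR":         ("SHIFT-IR", "UPDATE-IR"),
    "UPDATE-IR":        ("RUN-TEST/IDLE", "SELECT-DR-SCAN"),
}

def shift_dr_ends(tms_stream, state="TEST-LOGIC-RESET"):
    """Yield the cycle numbers at which the TAP leaves Shift-DR."""
    for cycle, tms in enumerate(tms_stream):
        next_state = NEXT[state][tms]
        if state == "SHIFT-DR" and next_state == "EXIT1-DR":
            yield cycle
        state = next_state

tms = [0, 1, 0, 0, 0, 0, 0, 1, 1, 0]    # hypothetical TMS values
print(list(shift_dr_ends(tms)))          # -> [7]: shifting ends at cycle 7
```

Output responses can then be compared at and after those cycles rather than while the chain is still shifting.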
The second step in fault simulation is to understand the differences between the target faults seen by ATPG and those seen by fault simulation, because the latter has to start from the fault-coverage result of the former. The differences arise for the following reasons:
- RAM BIST modules are modeled as black boxes for ATPG but as real logic in fault simulation, so fault simulation sees all the faults inside BIST modules that ATPG never saw. This is fine, and you can tell the fault simulator not to target faults in those modules (see the exclusion sketch below).
- The netlist used for fault simulation may not be the same as the one used by ATPG. For example, when ATPG is run, no in-place optimization has yet been performed on the design, because it is very early in the back-end process. After the design is placed and routed, gate and net delays are generated; based on those values, some gates may have to be beefed up, and minor design changes may even be needed, such as the example in Figure 16. In this case, you can either rerun ATPG on the modified netlist or let fault simulation cover the extra faults. Mismatches like this should always be reported to the ASIC designer to make sure they are intended.
- During ATPG testing, scan flops are of the muxscan type; their clk, d and si pins are tested automatically (but the qn pin is not). In fault simulation those pins become ordinary logic pins and are targeted by the fault simulator. You have to tell it not to target those pins (again, see the exclusion sketch below).
- As discussed before, if the gate-level netlist uses Verilog assign statements, ATPG translates them into pseudo buffer cells in its database. Those cells are not in the fault simulator's database.
- Sometimes, such as when delay cells are used, the design netlist has to be modified just for the fault simulator. In Figure 17, instances del_1 and del_2 are such cells. The circuitry between the clk and we_ram_unbuf signals, and the waveforms it generates, cannot be simulated directly in the fault simulator, because we_ram_unbuf's correct waveform depends on the delays through the two delay cells and no delay information can be modeled. Fault simulation is not unique in that regard; ASIC emulation and testing systems, for example, have the same limitation.
What we can do in fault simulation is bypass the circuitry between the clk and we_ram_unbuf signals in Figure 17(a) by feeding the waveform of we_ram_unbuf to pin a3 of instance we_logic directly from an added primary input pin called we_ram_unbuf_fs (_fs stands for fault simulation). This is shown in Figure 18. If the design has multiple RAMs, pin we_ram_unbuf_fs can feed all of them. Note that this pin and the multiplexer are added only for fault simulation, not for chip fabrication.
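How you exclude faults you have decided not to target (such as those inside BIST modules or on muxscan pins) is tool-specific, but the bookkeeping amounts to filtering the fault list by instance path and pin name. Here is a minimal Python sketch; the BIST instance prefix, the scan-flop matching rule and the hierarchical fault names are all assumptions for illustration.

```python
# Sketch: drop faults the fault simulator should not target, namely
# faults inside RAM BIST modules and faults on muxscan pins already
# covered by scan testing. Names and matching rules are hypothetical.

BIST_PREFIXES = ("u_rambist_",)        # RAM BIST instances (assumed naming)
SCAN_FLOP_PINS = {"clk", "d", "si"}    # muxscan pins tested by scan itself

def should_exclude(fault_pin):
    """fault_pin is a hierarchical name such as 'u_core/u_reg_5/si'."""
    path, _, pin = fault_pin.rpartition("/")
    if path.startswith(BIST_PREFIXES):               # inside a BIST module
        return True
    if pin in SCAN_FLOP_PINS and "u_reg" in path:    # crude scan-flop match
        return True
    return False

faults = ["u_rambist_0/ctl/a", "u_core/u_reg_5/si", "u_core/u_mux/b"]
print([f for f in faults if not should_exclude(f)])  # -> ['u_core/u_mux/b']
```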
In summary, you must be aware of these netlist differences between ATPG and fault simulation and decide what you need to target in fault simulation. After the fault-simulation database (which includes the coverage result of ATPG) is built, the fault simulator can generate an initial fault coverage as it sees it. This may differ from the coverage seen by ATPG, depending on how many new faults show up and how many faults you decide not to target. In addition, there should be far fewer untestable faults in fault simulation than in ATPG, because the RAMs are modeled as gray boxes and test data can now be propagated through them to test all the non-scan logic that directly accesses the RAMs.
Also, non-scan flops are no longer assigned model types of ALWAYS0 or ALWAYS1. In fact, some types of untestable faults are totally gone, such as CONS_UNCNTRL_UNTESTABLE and CONS_UNOBSRV_UNTESTABLE faults (see Table 2). This is because no primary I/O pins are constrained to constant values.
However, there is one type of untestable fault that is reported only by fault simulation but not by ATPG: CLOCK_FAULT_UNTESTABLE. It occurs mostly on the clock pin of storage elements (flops and latches) and makes it impossible to initialize them, causing unknown values to be propagated to the outputs.
An example is a stuck-at-1 fault on pin en of instance latch1 in Figure 19. CLOCK_FAULT_UNTESTABLE faults also occur on nonclock pins, such as a stuck-at-0 fault on pins of instance mux in Figure 20.
There is no CLOCK_FAULT_UNTESTABLE fault at all in ATPG on any of the pins we just discussed. So what kinds of faults do those pins have in ATPG? All faults on instance mux in Figure 20 are DANGLING_UNTESTABLE, because instance register1, a non-scan flop, is assigned the model type ALWAYS0. Instance latch1 in Figure 19 is inside a BIST module and is not seen by ATPG. CLOCK_FAULT_UNTESTABLE faults on clock and enable pins can at best be classified as potentially detected. Although the output values are nondeterministic, such faults are typically caught on the tester, because the clock fault eventually causes an output to differ from its expected value. Therefore, exclude such faults from fault simulation.
The last step in fault simulation is also the most difficult: the translation of non-scan test vectors into the format required by the fault simulator. There is no commercially available translator; you have to write your own.
Some ASIC vendors use a 24-logic-value system to describe the test vectors that are run on the chip, such as the one shown in Table 5. The first column, Static, means that the pin (input, output or bidirectional) has held its value throughout the current simulation cycle. The second column, Dynamic, means that the pin has changed from a previous value to the current one during the current cycle; the time of the change is given in the test vector file. The previous value is the one used in the previous cycle.
For example, if an input pin has a stimulus value of 1, it is a 1 throughout the current simulation cycle with no change; if it has a stimulus value of /, then during the current cycle its value changed from the previous cycle's value to 1 and will remain a 1 for the rest of the cycle.
In Table 5, "charged input" means input pad cells with repeaters that hold their values if not driven. This does not happen on a chip tester, so values Q, R and T are treated as U and values q, r and t as u. The term "3-stated output" means output pad cells with repeaters that hold their values if not driven and are currently tristated. This does not happen on the chip tester, so values C and E are treated as Z and values c and e as z. The values a pin or signal takes on can indicate its type. For example, clock and pure input signals take on values of 0, 1, U, o, / and u. Control signals to tristatable and bidirectional pins and pure output signals take on values of H, L, X, h, l and x. Tristatable pins take on values of C, E, Z, c, e and z, and, finally, bidirectional pins take on input-pin values if they serve as inputs and output-pin values if they serve as outputs. Some may wonder what an "unknown" value (such as U or u) means for an input pin on the chip tester. For most ASIC vendors, it means the pin is simply left floating; that is, not driven.
The big challenge is to match this 24-logic-value system to the four-logic-value system of 1, 0, X and Z used by the fault simulator. After careful study, the two systems are matched in Table 6, but even this is not the entire picture. For example, for a tristatable or bidirectional pin, if its control signal is X or changes to X, the pin's value can be any of C, E, Z, c, e and z in the 24-logic-value system, but that value must be X or change to X in fault simulation; this is the way most logic and fault simulators work when the control signal is X. If the control signal is not X and the pin's value is one of C, E, Z, c, e and z, then the value is translated to Z in fault simulation. Subtleties of this kind must be fully understood and properly handled; otherwise, fault simulation will have output-response mismatches. Another difference between the two systems is that the 24-logic-value vectors contain internal control signals for tristatable and bidirectional pins. Those signals cannot go into the stimulus and response files for the fault simulator, but they tell your translator when tristatable pins take on Z values and when bidirectional pins change direction, so that those two files can be generated correctly.
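To give the flavor of the translation, the Python sketch below folds the 24 values down to 0, 1, X and Z along the lines just described. The mapping is an assumption consistent with the text, not a reproduction of Table 6, and for brevity it collapses each static/dynamic pair and drops the change time within the cycle, which a real translator must preserve.

```python
# Sketch: fold the 24-logic-value system into the fault simulator's
# four values. Charged-input values become unknown; held tristate
# values become Z. The mapping is illustrative, not the vendor's table.

TO_FOUR_VALUE = {
    # inputs: static, then dynamic
    "0": "0", "1": "1", "U": "X",   "o": "0", "/": "1", "u": "X",
    # charged inputs, treated as unknown (not reproducible on a tester)
    "Q": "X", "R": "X", "T": "X",   "q": "X", "r": "X", "t": "X",
    # expected output values
    "L": "0", "H": "1", "X": "X",   "l": "0", "h": "1", "x": "X",
    # tristated outputs, including held ('3-stated') values C and E
    "Z": "Z", "C": "Z", "E": "Z",   "z": "Z", "c": "Z", "e": "Z",
}

def translate(values, control_is_x=False):
    """Translate one cycle's pin values; force X when the tristate
    control signal is X, as most logic and fault simulators require."""
    out = []
    for v in values:
        four = TO_FOUR_VALUE[v]
        if four == "Z" and control_is_x:
            four = "X"
        out.append(four)
    return "".join(out)

print(translate("1/UHCz"))   # -> '11X1ZZ'
```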
It is not a trivial task to write this translator, because the differences between the two systems are numerous and subtle, but it can and must be done. The goal is no output mismatch when the translated test vectors are run by the fault simulator, because the key to fault simulation is propagating the effects of internal faults to the primary outputs for observation. If there is even a single output mismatch, the fault simulation is not valid, since the simulation vectors are not exercising the design the same way the chip is exercised on the chip tester. Fault simulation can start once correct input-stimulus and output-response files have been generated. After each test is run, mark the faults it detects so that the next test does not target them again (a practice commonly called fault dropping; a sketch appears below). That way, each test should reduce the number of remaining uncovered faults. After all available non-scan tests are run, most likely there will still be some uncovered faults. If they are in blocks that must be covered, more test vectors must be developed manually to target those faults. This can be a time-consuming and iterative process, but practical, proven methodologies and useful tools are available from many sources, such as fault-simulation-tool vendors.
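The fault-dropping loop itself is simple; all the hard work hides inside the fault simulator. Here is a Python sketch in which run_fault_sim is a stand-in for the real tool:

```python
# Sketch: grade a series of non-scan tests, dropping each detected
# fault so later tests do not target it again.

import random

def run_fault_sim(test, targets):
    """Stand-in for the fault simulator: a real tool reports which
    targeted faults the test propagates to a primary output. Here we
    just pretend roughly 30 percent of the remaining faults are hit."""
    rng = random.Random(test)
    return {f for f in targets if rng.random() < 0.3}

def grade_tests(tests, undetected):
    """Run the tests in order, dropping detected faults as we go."""
    for test in tests:
        newly = run_fault_sim(test, undetected)
        undetected -= newly
        print(f"{test}: {len(newly)} newly detected, {len(undetected)} remain")
    return undetected   # faults still needing hand-written vectors

remaining = grade_tests(
    ["logic_test", "parametric_test", "tap_test"],   # hypothetical tests
    {f"fault_{i}" for i in range(1000)},             # undetected after ATPG
)
```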
In conclusion, scan-testing problems in an ASIC design are not trivial. They can cost you a lot of time as well as resources at the back end if they are not carefully thought out at the front end. Therefore, it is important to handle them as design matters rather than as entities isolated from the design. It is equally important that ASIC designers keep abreast of new scan-testing techniques, which may provide high fault coverage with much less effort and can be used together with existing scan-testing methodologies.